Lambda.subst: also update debug event environments #1751

trefis · 2018-05-01T10:59:01Z

https://caml.inria.fr/mantis/view.php?id=7554

@alainfrisch , @gasche : any opinion on the approach proposed here?

Testing

$ cat pr7554.ml
let () =
  let x, y, z = (List.length [], 3.14, "haha") in
  Printf.printf "%d - %.0f%%) : %S\n"
    x y z
$ ocamlc -g ./pr7554.ml -o pr7554.exe
$ ocamldebug ./pr7554.exe
	OCaml Debugger version 4.08.0+dev0-2018-04-09

(ocd) break @Pr7554 2
Loading program... done.
Breakpoint 1 at 131584: file ./pr7554.ml, line 2, characters 18-32
(ocd) run
Time: 18 - pc: 131584 - module Pr7554
Breakpoint: 1
2   let x, y, z = (List.length []<|a|>, 3.14, "haha") in
(ocd) n
Time: 19 - pc: 131600 - module Pr7554
3   <|b|>Printf.printf "%d - %.0f%%) : %S\n"
(ocd) p x y z

Without the changes proposed here the result is:

Unbound identifier x

With the changes:

x: int = 0
y: float = 3.14
z: string = "haha"

Notes about the implementation

This is a proof of concept.
Things that could/should be tried before merging:

it seems sad to go over the lambda twice: once to substitute the subterms, once to update the environments, it's probably possible to do both in a single pass
I'm not sure simplify_exits is the only place which introduces new bindings, any other place doing so should probably be updated in the same way.

gasche · 2018-05-01T19:24:45Z

Very naive question: couldn't Lambda.subst just do the right thing on Levent nodes?

trefis · 2018-05-02T10:33:09Z

As I said:

This is a proof of concept. [...] it seems sad to go over the lambda twice: once to substitute the subterms, once to update the environments, it's probably possible to do both in a single pass

But to answer more precisely: I'm not sure Lambda.subst is the right place. It substitutes idents for arbitrary terms, and it feels a bit weird to add "and also does arbitrary things to the environment of of debug event".
Here we're in a particular case where we just substitute idents. It might make sense to write a specialized version of Lambda.subst that does that plus the environment update.
Or perhaps we could update simplif to update the environment itself during its recursion.

gasche

What the theory tells us is that a substitution by an environment env goes from an enviromnent E1 to an environment E2 if, for all x \in E1, the term env(x) is well scoped in E2. Then, for any term t that is well-scoped is E1, subst env t is well-scoped in E2. The variable-renaming cases corresponds to the case where env(x) is also a variable, and then has to belong to E2. (In the actual implementation, we only give a fragment of the environment with the notion that variables not in the fragment are unchanged, so we rather go from E,E1 to E,E2 for a common, untouched environment E).

A general way to make subst do the right thing on environments would be to add a parameter of type variable -> value_description -> env -> env, which for a given variable x : vd and result environment E2, extends E2 to contain the environment necessary for env(x). In particular, for a variable->variable renaming the function would just add the renamed binder to the input environment, and for substitutions that turn a variable into a closed term the function would return the input environment unchanged.

In the implementation of Lambda.subst, an auxiliary function of type

(var -> vd -> Env.t -> Env.t) -> _ Ident.Map.t -> summary -> summary

would be handy.

Re. potential specializations, I looked at uses of Lambda.subst, they seem to all be one of the two forms:

variable->variable renamings
substituting a variable with a closed term (in translclass and translmod),
or at least a term with free variables present only in the unchanged, common environment
(neither in the support of the env fragment, nor new variables)

This suggests that having two specialized traversals would also work -- but with more code duplication in Lambda.

gasche · 2018-05-02T12:33:00Z

bytecomp/lambda.ml

@@ -263,7 +263,7 @@ and lambda_event =
  { lev_loc: Location.t;
    lev_kind: lambda_event_kind;
    lev_repr: int ref option;
-    lev_env: Env.summary }
+    lev_env: Env.t }


Is this a wise choice? I thought that Env.summary was designed precisely to be the "compact" part of the environment for use in debug information.

I'm not sure where you got that from, but you probably have better sources than I do. I'm also not sure what you're worried about: we still "summarize" these environment when transforming lambda to the next AST, they are never marshalled or anything.
Considering we still have modifications to do on the environment at this point, I think it make more sense to actually modify Env.ts than the summaries.

One wouldn't think of looking there but I got it from env.mli :-)

Indeed, I hadn't looked there.
Then again, depends how you interpret it. Does "compact" refer to the marshalled representation that end up in .cmo files (or similar), or to something else? My bet is on the former, in which case I believe my change is fine.

trefis · 2018-05-04T15:10:51Z

I update Lambda.subst to take an extra parameter which it calls to update debug events environments.
The current change is more invasive than the original version and while it seems correct to me, and the test described above still works, I haven't tested it extensively.

gasche

I like your last commit, which is not terribly surprising as it is close to my substitution-and-scoping rants above. I would encourage you to rebase eventually to remove the previous, more hackish approach from the patchset.

I am still not convinced by the Env.summary => Env.t change in lambda. I think of Env.summary as "inspectable data" and Env.t as "opaque stuff", so to me it seems proper to use the former as part of an intermediate representation such as Lambda.

Is there something problematic in writing an auxiliary function that takes an Env.t -> Env.t transformation, and turns it into an Env.summary -> Env.summary transformation? This seems easy to do from the API, is there some efficiency concern?

gasche · 2018-05-05T12:55:16Z

bytecomp/lambda.ml

@@ -590,7 +590,7 @@ let rec make_sequence fn = function
   Assumes that the image of the substitution is out of reach
   of the bound variables of the lambda-term (no capture). *)

-let rec subst s lam =
+let rec subst update_env s lam =


The update_env function remains constant over the traversal, so it doesn't need to be a parameter to inner recursive calls. Could you maybe wrap the mutual recursion under a toplevel let subst update_env s lam = function, so that the patch is less invasive and the code lighter? (You still need an indentation bump.)

gasche · 2018-05-05T12:55:47Z

bytecomp/lambda.mli

@@ -334,7 +334,7 @@ val transl_class_path: ?loc:Location.t -> Env.t -> Path.t -> lambda

 val make_sequence: ('a -> lambda) -> 'a list -> lambda

-val subst: lambda Ident.Map.t -> lambda -> lambda
+val subst: (Ident.t -> Types.value_description -> Env.t -> Env.t) -> lambda Ident.Map.t -> lambda -> lambda


Maybe you could extend the comment with a word on what the new parameter does?

Will do once we settle on what it should actually do.

gasche · 2018-05-05T12:58:01Z

bytecomp/simplif.ml

-  )
+let update_env idmap oldid vd env =
+  let newid = Ident.Map.find oldid idmap in
+  Env.add_value newid vd env


This toplevel definition feels a bit lost, and then the use-sites duplicate idmap-building logic. Why don't you just expose a variant of subst called rename, here or directly in Lambda, that takes a list of (old_id, new_id) pairs?

gasche · 2018-05-05T13:02:32Z

bytecomp/lambda.ml

+        Ident.Map.fold (fun id _ env ->
+          match Env.find_value (Path.Pident id) evt.lev_env with
+          | exception Not_found -> env
+          | vd -> update_env id vd env


I think that in this branch update_env should be called on a version of env where the id binding has been removed. Otherwise there is no way to remove identifiers from the environment when performing a substitution.

gasche · 2018-05-05T13:03:43Z

bytecomp/translmod.ml

@@ -839,21 +852,22 @@ let field_of_str loc str =


 let transl_store_structure glob map prims str =
+  let no_env_update _ _ env = env in


If the implementation of substitution on environments is adapted to remove bindings before update_env is called, then this function could/should renamed into remove_from_env, and I believe this would be the correct behavior for the use-sites below.

trefis · 2018-05-07T23:16:32Z

I am still not convinced by the Env.summary => Env.t change in lambda. I think of Env.summary as "inspectable data" and Env.t as "opaque stuff", so to me it seems proper to use the former as part of an intermediate representation such as Lambda.

And I myself remain completely unconvinced by that argument. Sorry.

Is there something problematic in writing an auxiliary function that takes an Env.t -> Env.t transformation, and turns it into an Env.summary -> Env.summary transformation? This seems easy to do from the API, is there some efficiency concern?

There is no difficulty: Envaux.env_of_only_summary does the conversion in one direction, Env.summary in the other. But I'm indeed worried about the performance, the first of these two functions seems quite costly.

But regardless of the potential cost, I just disagree with your opinion regarding the use of Env.t vs Env.summary. (I'm repeating that point so as to not get dragged into a discussion about the actual impact on performance of these operations)

I think that in this branch update_env should be called on a version of env where the id binding has been removed. Otherwise there is no way to remove identifiers from the environment when performing a substitution.

Not that it should matter, but there is currently no way to remove anything from the environment, ever.
I'm not quite sure we actually always want that behavior (though I suspect you're right), I will think about it tomorrow (i.e. tomorrow I will try and understand what substitutions translmod applies).

trefis · 2018-05-08T20:01:52Z

To comment on this:

I'm not quite sure we actually always want that behavior

I was wondering about the impact of these substitutions (the ones done in translmod that is) on "debugability", but given that the substitutions are done then I agree a correct semantic would be to remove the bindings from the environments while performing the substitutions.
It's worth noting that these substitutions are done only with the non-flambda native backend (be it normal compilation or from the native toplevel), both the bytecode compiler and the native compiler with flambda enabled introduce local aliases before doing a simple renaming (i.e. Lambda.rename) which seems much better for debugging.

I'm running a bit out of steam on this PR: I thought it would be a quick and easy (and gratifying, and whatever else you can think of!) fix that would distract me for one evening of the things I usually do; I should have known better :p
(Just to be clear: I don't mean to complain about the review you've done on this Gabriel, I think that thanks to your comments the PR is in a better state that it initially was. I just have other things to do at the moment)

Here is I believe, a summary of the state of this PR:

it seems that we disagree about the change putting Env.ts instead of Env.summarys inside lambda_events.
Lambda.subst should remove values from the environment before calling its update_env parameter. However:
- the fact that it doesn't is AFAICT not observable from any of the debuggers currently available (that might not be true if something like Make OCaml native debugging awesome #574 ever lands, but I'm not really holding my breath).
- there is currently no helper to remove a binding from the environment. Writing one seems reasonably easy (though quite ad-hoc), but might have a negative impact on compilation time.

While I feel guilty for saying that, I don't think point 2 is a blocker and I feel like we could merge even without handling it (we're not introducing a regression, we're "only" half-fixing the issue).
Point 1 seems like it might be a blocker for you, but unless another person/argument appears to change my mind, I don't think I'll do any work on it.

So it seems that we have two options here (if we don't want this PR to rot away for months):

if you care enough about these issues, and have enough time to work on them, then I can close this PR and you pick up from here and do the changes you proposed as well as some benchmarking to confirm we're not degrading compilation time too much
or I rebase this on a recent trunk, clean up the commit history, and we merge it

I'd be happy with either of these two options (and somewhat sad if neither was taken...)

What do you think?

gasche · 2018-05-08T20:37:50Z

I think that your proposals for both (1) and (2) are reasonable and pragmatic. Re. (1), I agree with your point about efficiency concerns, and with the idea that consumers could always perform summarization locally.

(We could think of adding a type parameter to Lambda.t that is either Env.t or Env.summary, to keep exporting summaries to the outside but have environments for internal transformations. But the change to all lambda-using interfaces may make the curse worse than the disease.)

I would find it reassuring to have a comment from someone that knows when and how summarization really matters.)

trefis · 2018-05-08T20:48:32Z

I would find it reassuring to have a comment from someone that knows when and how summarization really matters

My intuition is that we need to summarize when we serialize (which happens for debug events, but also for cmts).
This might have initially been because of size concerns (again, my intuition would be that the summary is more compact than the whole env) and is currently mandatory because of the environment contains functional values which can't be marshaled.

I somewhat doubt that there ever was another reason, but I'm only guessing.

trefis · 2018-05-14T21:34:12Z

I have rebased and cleaned up the history.

Given that there has been no comment for the past week regarding the Env.t vs Env.summary discussion, and that we can always revisit that particular change later if someone wakes up to shed some light on the topic, I am tempted to merge this in trunk sometime in the next few days.

gasche · 2018-05-14T21:49:22Z

@xavierleroy: is there any reason not to change the payload of Levent from Env.summary to Env.t, as long as we convert back to summary before any serialization, in particular when producing bytecode?

(To summarize the discussion above: the reason to use Env.t instead of Env.summary is to be able to replace bindings when performing subtitutions, which would take linear time on summaries.)

damiendoligez · 2018-05-16T15:07:39Z

I dug up an old discussion on caml-devel (from 2012!) about environment summaries. The gist is that summaries are used in debugger events (in byte code executables) and .cmt files instead of full environments, in order to keep the files reasonably small.

The main protagonists were @lefessan and @xavierleroy. Any opinions?

xavierleroy · 2018-05-16T15:18:41Z

Summaries (Env.summary) are much more compact than environments (Env.t), by factors of 100 or more. That's why the original code drops Env.t in favor of Env.summary as soon as the full Env.t are no longer needed, i.e. when leaving the typechecker, keeping in Lambda and below the strict minimum Env.summary needed for debugging information. It does make me nervous to keep Env.t live longer than that.

trefis · 2018-05-17T15:03:38Z

Thanks for the context!

I'm still in favor of keeping the change from summaries to full env in lambda, for the same reasons as previously. That is: Env.t still do not make it to build artifacts, and we're still modifying the environment at that point, so it makes more sense (and is more efficient) to work on the actual environment.

I had a look to see if that last argument could be used to justify pushing Env.t all the way through to the bytecode, and I do not believe so: there are a bunch of calls to Ident.create and Ident.rename in Transl*, Matching and Simplif, but there doesn't seem to be anything after that.
(well, except when we rename globals when building a packed module, but I don't really know what to make of that)

gasche · 2018-05-21T07:15:12Z

After thinking more about the issue, I have come to agree with @trefis here: having Env.summary in lambda trees means that we cannot implement substitution correctly over lambda, and that is a highly undesirable property for an intermediate representation.

It is important that summary is used for anything serialized, and the current PR ensures this. The concern over compactness also comes from memory usage during compilation. Maybe this could be alleviated by some compilation performance testing? If no important degradation is found, I would be in favor of the PR as it is today.

If we wanted to improve on this, several choices:

What makes Env.t non-compact is partly the arborescent/mapping structure, and partly due to information that made sense during type-checking but shouldn't be useful after that (callbacks for usage detection and unused warnings, etc.). It could be possible to have a new intermediate representation of environment that keeps the mapping structure, but keeps only the data that is useful for a now-fully-typed representation.
As a simpler thing, maybe there is a way to process an Env.t structure to remove the data justified for type inference in a systematic way, without changing the type. (Putting opened = None in IdTbl tables, for example). This is not very nice.
As an even simpler thing, it may be possible to change the representation of events to delay term substitutions, storing a pair of a substitution and a summary. Computing the actual environment could be forced when converting out of lambda form, so that the linear cost of the summary -> env transformation is paid only once.

lpw25 · 2018-05-21T10:37:31Z

As an even simpler thing, it may be possible to change the representation of events to delay term substitutions, storing a pair of a substitution and a summary.

Slight variation: you could add a substitution constructor to the Env.summary type and just perform the substitution if/when it is converted back to an ordinary Env.t.

trefis · 2018-06-05T12:17:06Z

I'd still rather keep Env.ts in debug events in lambda, but I'd be OK with Leo's suggestion.

That said, I'm too shy to take a decision on this, so I welcome any opinion (one way or the other).

Introduced "Lambda.rename" for var->var substitutions, this also updates the debug envs automatically. This fixes MPR#7554

trefis · 2018-07-17T09:41:25Z

Apparently given enough time, shyness goes away: I rebased this morning, the CIs appear to pass, I'll merge as is later today.

gasche reviewed May 2, 2018

View reviewed changes

trefis force-pushed the pr7554 branch from 8185992 to f4ea01f Compare May 4, 2018 15:04

trefis changed the title ~~MPR#7554: have simplif refresh debug events environments when introducing new bindings~~ Lambda.subst: also update debug event environments May 4, 2018

gasche reviewed May 5, 2018

View reviewed changes

trefis force-pushed the pr7554 branch from 2801b57 to 16826ec Compare May 9, 2018 20:28

gasche approved these changes May 14, 2018

View reviewed changes

damiendoligez added the bug label May 16, 2018

trefis force-pushed the pr7554 branch from 16826ec to 0a0cd92 Compare June 25, 2018 15:49

trefis added 3 commits July 17, 2018 09:49

lambda_event contains actual env, not a summary

e6ce844

Lambda.subst: update debug events environments

bb3e89d

Introduced "Lambda.rename" for var->var substitutions, this also updates the debug envs automatically. This fixes MPR#7554

Changes

a7ce8de

trefis force-pushed the pr7554 branch from 0a0cd92 to a7ce8de Compare July 17, 2018 08:49

trefis merged commit f280343 into ocaml:trunk Jul 19, 2018

trefis deleted the pr7554 branch July 19, 2018 09:29

vicuna mentioned this pull request Mar 14, 2019

identifiers not recognized by ocaml when assigned by deconstructing a tuple #7554

Closed

EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024

Paginate Changelog entries (ocaml#1751)

cdf317b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lambda.subst: also update debug event environments #1751

Lambda.subst: also update debug event environments #1751

trefis commented May 1, 2018

gasche commented May 1, 2018

trefis commented May 2, 2018

gasche left a comment

gasche May 2, 2018

trefis May 3, 2018

gasche May 3, 2018

trefis May 3, 2018

trefis commented May 4, 2018

gasche left a comment •

edited

gasche May 5, 2018

trefis May 7, 2018

gasche May 5, 2018

trefis May 7, 2018

gasche May 5, 2018

trefis May 7, 2018

gasche May 5, 2018

gasche May 5, 2018

trefis commented May 7, 2018

trefis commented May 8, 2018

gasche commented May 8, 2018

trefis commented May 8, 2018

trefis commented May 14, 2018

gasche commented May 14, 2018

damiendoligez commented May 16, 2018

xavierleroy commented May 16, 2018

trefis commented May 17, 2018 •

edited

gasche commented May 21, 2018

lpw25 commented May 21, 2018

trefis commented Jun 5, 2018

trefis commented Jul 17, 2018

		@@ -839,21 +852,22 @@ let field_of_str loc str =


		let transl_store_structure glob map prims str =
		let no_env_update _ _ env = env in

Lambda.subst: also update debug event environments #1751

Lambda.subst: also update debug event environments #1751

Conversation

trefis commented May 1, 2018

Testing

Notes about the implementation

gasche commented May 1, 2018

trefis commented May 2, 2018

gasche left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

trefis commented May 4, 2018

gasche left a comment • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

trefis commented May 7, 2018

trefis commented May 8, 2018

gasche commented May 8, 2018

trefis commented May 8, 2018

trefis commented May 14, 2018

gasche commented May 14, 2018

damiendoligez commented May 16, 2018

xavierleroy commented May 16, 2018

trefis commented May 17, 2018 • edited

gasche commented May 21, 2018

lpw25 commented May 21, 2018

trefis commented Jun 5, 2018

trefis commented Jul 17, 2018

gasche left a comment •

edited

trefis commented May 17, 2018 •

edited